Informative Term Selection for Automatic Query Expansion
Authors
Abstract
Techniques for query expansion from top retrieved documents have recently been used by many groups at TREC, often on purely empirical grounds. In this paper we present a novel method for ranking and weighting expansion terms. The method is based on the concept of relative entropy, or Kullback-Leibler distance, developed in Information Theory, from which we derive a computationally simple and theoretically justified formula to assign scores to candidate expansion terms. This method has been incorporated into a comprehensive prototype ranking system, tested in the ad hoc track of TREC-7. The system's overall performance was comparable to the median performance of TREC-7 participants, which is quite good considering that we are new to TREC and that we used unsophisticated indexing and weighting techniques. More focused experiments showed that the use of an information-theoretic component for query expansion significantly improved mean retrieval effectiveness over the unexpanded query, yielding performance gains as high as 14% (for non-interpolated average precision), while a per-query analysis suggested that queries that are neither too difficult nor too easy can be more easily improved upon.
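The abstract does not state the scoring formula itself, so the following is only a minimal sketch of one common reading of relative-entropy term scoring. It assumes that each candidate term is scored by its contribution to the Kullback-Leibler divergence between the term distribution in the top retrieved documents and the term distribution in the whole collection, i.e. p_R(t) * log(p_R(t) / p_C(t)). The function names, the tokenized-list input format, and the choice to skip terms absent from the collection are all hypothetical and are not taken from the paper.

```python
import math
from collections import Counter


def kl_term_scores(feedback_docs, collection_docs):
    """Score terms by their contribution to KL(p_R || p_C).

    Assumption (not from the abstract): p_R is the relative term frequency in
    the top retrieved (feedback) documents and p_C is the relative term
    frequency in the whole collection. Documents are lists of tokens.
    """
    fb_counts = Counter(t for doc in feedback_docs for t in doc)
    coll_counts = Counter(t for doc in collection_docs for t in doc)
    fb_total = sum(fb_counts.values())
    coll_total = sum(coll_counts.values())

    scores = {}
    for term, count in fb_counts.items():
        if coll_counts[term] == 0:
            # Hypothetical guard: a term unseen in the collection has p_C = 0,
            # which would make the ratio undefined, so it is skipped here.
            continue
        p_r = count / fb_total
        p_c = coll_counts[term] / coll_total
        scores[term] = p_r * math.log(p_r / p_c)
    return scores


def top_expansion_terms(scores, k):
    """Return the k highest-scoring candidate expansion terms."""
    return sorted(scores, key=scores.get, reverse=True)[:k]
```

Under this reading, terms that are much more frequent in the top retrieved documents than in the collection receive high positive scores, while terms that are relatively rarer there receive negative scores and would not be selected for expansion.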
Related resources
Comparison of Global Term Expansion Methods for Text Retrieval
This paper describes our work at the fifth NTCIR workshop on the subtasks of single language information retrieval (SLIR). Several automatic global query expansion strategies were explored based on a machine-derived thesaurus. These term selection strategies were compared with manual selection and local expansion. Experiments show that all the global expansion strategies perform worse than the s...
QEA: A New Systematic and Comprehensive Classification of Query Expansion Approaches
A major problem in information retrieval is the difficulty of defining the user's information needs; moreover, when the user submits a query, there is a vast amount of information to retrieve. Different methods have therefore been suggested for query expansion, which is concerned with reconfiguring the query to increase efficiency and improve the criterion accuracy in the information...
Evolving Term-Selection Schemes for Pseudo-Relevance Feedback in Information Retrieval
Automatic query expansion in Information Retrieval aims to improve retrieval performance by overcoming the problem of term mismatch between a query and its relevant documents. Pseudo-relevance (blind) feedback techniques have been shown to be of benefit on large TREC collections in recent years. This technique analyses terms in the top few documents deemed relevant by the system, reformulates th...
Query Expansion Using Term Distribution and Term Association
Good term selection is an important issue for an automatic query expansion (AQE) technique. AQE techniques that select expansion terms from the target corpus usually do so in one of two ways. Distribution based term selection compares the distribution of a term in the (pseudo) relevant documents with that in the whole corpus / random distribution. Two well-known distribution-based methods are b...
Informative term selection for automatic query expansion
Techniques for query expansion from top retrieved documents have recently been used by many groups at TREC, often on purely empirical grounds. In this paper we present a novel method for ranking and weighting expansion terms. The method is based on the concept of relative entropy, or Kullback-Leibler distance, developed in Information Theory, from which we derive a computationally simple and th...
Relevance Feedback Based Query Expansion Model Using Borda Count and Semantic Similarity Approach
Pseudo-Relevance Feedback (PRF) is a well-known method of query expansion for improving the performance of information retrieval systems. Not all terms in PRF documents are important for expanding the user query. Therefore, selecting proper expansion terms is very important for improving system performance. Individual query expansion term selection methods have been widely investigated fo...